
feat(arcadedb): add metadata query methods to ArcadeDBDocumentStore #3013

Merged
davidsbatista merged 11 commits into deepset-ai:main from ria-19:feat/arcadedb-metadata-methods
Mar 24, 2026

Conversation

@ria-19
Contributor

@ria-19 ria-19 commented Mar 21, 2026

Related Issues

Proposed Changes:

Implements five metadata query methods for ArcadeDBDocumentStore listed in issue #2980:

  • count_documents_by_filter
  • count_unique_metadata_by_filter
  • get_metadata_fields_info
  • get_metadata_field_min_max
  • get_metadata_field_unique_values

Also adds:

  • _infer_metadata_field_type and _extract_distinct_values static helpers
  • SCHEMA_SAMPLING_LIMIT class constant (default 1000)

Implementation notes

Schema sampling: get_metadata_fields_info caps its query at LIMIT 1000 (the SCHEMA_SAMPLING_LIMIT constant) to avoid the out-of-memory risk and latency of full-table scans on large stores.
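
As a rough illustration of the sampling idea, the sketch below infers field types from at most SCHEMA_SAMPLING_LIMIT rows. It is not the code merged in this PR: `infer_field_type`, `fields_info_from_sample`, the row shape (`{"meta": {...}}`), and the type labels ("boolean", "long", "float", "keyword") are all assumptions made for the example.

```python
from typing import Any

SCHEMA_SAMPLING_LIMIT = 1000  # same default as the constant named in this PR


def infer_field_type(value: Any) -> str:
    """Map a Python value to a type label (labels are assumed, not the PR's)."""
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "keyword"


def fields_info_from_sample(
    rows: list[dict[str, Any]], limit: int = SCHEMA_SAMPLING_LIMIT
) -> dict[str, dict[str, str]]:
    """Infer one type per metadata field from the first `limit` rows only."""
    info: dict[str, dict[str, str]] = {}
    for row in rows[:limit]:
        for key, value in (row.get("meta") or {}).items():
            # the first non-null value seen for a field decides its type
            if value is not None and key not in info:
                info[key] = {"type": infer_field_type(value)}
    return info


sample = [
    {"meta": {"year": 2024, "published": True}},
    {"meta": {"year": 2025.5, "category": "news"}},
]
fields = fields_info_from_sample(sample)
```

One consequence of sampling is that a field appearing only in rows beyond the limit will not be reported.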

Search term embedding: get_metadata_field_unique_values embeds the search term into the SQL string via _sql_str() rather than passing it through positional_params. The reason is that _command currently sends params as a JSON array, whereas ArcadeDB's HTTP API expects a named params map {"key": value} with :key placeholders. No existing method uses positional_params, so this has not caused failures elsewhere.
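
To make the mismatch concrete, the sketch below builds both request bodies side by side. The helper names, the `Document` type name, and the `meta.category` field are hypothetical; the body keys (`language`, `command`, `params`) follow my understanding of the ArcadeDB HTTP command endpoint and should be checked against its documentation.

```python
import json
from typing import Any


def positional_payload(command: str, params: list[Any]) -> str:
    """Shape described above as what _command currently sends: params as an array."""
    return json.dumps({"language": "sql", "command": command, "params": params})


def named_payload(command: str, params: dict[str, Any]) -> str:
    """Shape the PR says ArcadeDB expects: a named map matching :key placeholders."""
    return json.dumps({"language": "sql", "command": command, "params": params})


# Positional style: "?" placeholders with a JSON array of values
as_array = positional_payload(
    "SELECT FROM Document WHERE meta.category = ?", ["news"]
)

# Named style: ":term" placeholder with a {"term": value} map
as_map = named_payload(
    "SELECT FROM Document WHERE meta.category = :term", {"term": "news"}
)
```

The only difference between the two bodies is the type of `params`, which is why the issue would stay invisible until some method actually passes parameters.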

How did you test it?

Integration tests added for all five methods covering: happy path, no matches, empty filter, empty field list, pagination, and case-insensitive search. All run against real ArcadeDB via the existing Docker service in CI.
Bug Fix: Resolved a FrozenInstanceError in assert_documents_are_equal using dataclasses.replace for document comparison.
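
The pattern behind that fix can be sketched with a stand-in frozen dataclass. `Doc` below is not Haystack's Document class, and normalizing the `score` field is an assumption chosen for the example; the PR does not state which field was involved.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Doc:
    """Stand-in for an immutable document type (hypothetical)."""
    id: str
    content: str
    score: Optional[float] = None


def without_score(doc: Doc) -> Doc:
    """Return a copy with score cleared; the original object is left untouched."""
    return replace(doc, score=None)


received = Doc(id="1", content="hello", score=0.87)
expected = Doc(id="1", content="hello")

# Assigning to a field of a frozen dataclass raises FrozenInstanceError
try:
    received.score = None  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False

# Comparing normalized copies works without mutating either document
documents_match = without_score(received) == without_score(expected)
```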

hatch run fmt-check        
hatch run test:types       
hatch run test:integration 

Known follow-up items

  • _command positional_params sends a JSON array, but ArcadeDB expects a named map. Will raise as a separate bug issue after further verification.
  • AstraDocumentStore._get_metadata_projection_documents fetches all documents with no limit for schema inference. Will raise as an enhancement for Astra after further verification.

Notes for the reviewer

SQL Constraints: Used SELECT DISTINCT + Python len() for unique counts because the current ArcadeDB SQL parser has limitations with COUNT(DISTINCT ...).
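
A minimal sketch of that workaround, under stated assumptions: `distinct_query` and `count_distinct` are hypothetical names, the `meta.` prefix and the `value` alias are choices made for the example, and the rows are assumed to come back as a list of dicts.

```python
import re
from typing import Any

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def distinct_query(type_name: str, field: str) -> str:
    """Build a SELECT DISTINCT query; names are validated because they are interpolated."""
    for name in (type_name, field):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"SELECT DISTINCT meta.{field} AS value FROM {type_name}"


def count_distinct(rows: list[dict[str, Any]]) -> int:
    """Count unique values client-side instead of using COUNT(DISTINCT ...)."""
    # rows are already distinct; skip the null row produced by documents lacking the field
    return len([row for row in rows if row.get("value") is not None])


query = distinct_query("Document", "category")
rows = [{"value": "news"}, {"value": "sports"}, {"value": None}]
unique_count = count_distinct(rows)
```

The trade-off is that every distinct value crosses the wire before being counted, which is acceptable for low-cardinality metadata fields but grows with the number of unique values.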
Sampling Default: 1000 is the current default for schema inference; let me know if the team prefers a different threshold.

AI assistance disclaimer

Developed with AI assistance for syntax review and code audit. I authored the underlying logic, verified all implementations against the ArcadeDB HTTP API, and confirmed all tests pass locally.

Checklist

@ria-19 ria-19 requested a review from a team as a code owner March 21, 2026 20:11
@ria-19 ria-19 requested review from julian-risch and removed request for a team March 21, 2026 20:11
@CLAassistant

CLAassistant commented Mar 21, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added integration:arcadedb type:documentation Improvements or additions to documentation labels Mar 21, 2026
@julian-risch julian-risch requested review from davidsbatista and removed request for julian-risch March 23, 2026 08:29
@ria-19
Contributor Author

ria-19 commented Mar 24, 2026

Thank you @davidsbatista for the help with the Mixin refactoring and the 'meta' prefix fixes. I'm still learning the internal patterns of the repo; I really appreciate the guidance and the polish.

@davidsbatista
Contributor

Thank you @ria-19 for your contribution, I did a few last adjustments as you noticed.

@davidsbatista davidsbatista merged commit 155d5b5 into deepset-ai:main Mar 24, 2026
9 checks passed


Development

Successfully merging this pull request may close these issues.

Add missing operations to the ArcadeDBDocumentStore

3 participants